# Extracting Drug Information in `detail_step`

## Description

In this tutorial, we will guide you through a detailed example showing how to configure Jexter to extract drug names, approval numbers, company names, attachment links, and drug reference information.

## Detailed Case Explanation

Let's assume you have a web page with the following structure, which contains drug information:

```html
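<table>
  <tr>
    <td class="drug-name">Drug A</td>
    <td class="approval-number">[Batch 123456]</td>
    <td class="company-name">Company X</td>
  </tr>
  <tr>
    <td class="drug-name">Drug B</td>
    <td class="approval-number">[Batch 789012]</td>
    <td class="company-name">Company Y</td>
  </tr>
</table>
<div class="attachments">
  <a href="...">Attachment 1</a>
  <a href="...">Attachment 2</a>
</div>
<div class="reference">
  <p>This is a detailed reference for Drug A.</p>
</div>
<div class="reference">
  <p>This is a detailed reference for Drug B.</p>
</div>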
```

We will extract the following details:

- Drug Name (`drug_name`)
- Approval Number (`auth_num`)
- Company Name (`company`)
- Attachment Links (`attachments`)
- Drug Reference Information (`drug_reference`)

## Jexter Configuration

Here's a Jexter configuration file that targets the example HTML structure:

```json
{
  "elements": {
    "drug_name": {
      "col": "//td[@class='drug-name']"
    },
    "auth_num": {
      "col": "//td[@class='approval-number']",
      "function": {
        "regexp": "\\[Batch (\\d+)\\]",
        "type": "string",
        "return": [1]
      }
    },
    "company": {
      "col": "//td[@class='company-name']"
    },
    "attachments": {
      "innerHtml": "//div[@class='attachments']/a",
      "extract_attachments": {}
    },
    "drug_reference": {
      "innerHtml": "//div[@class='reference']/p"
    }
  }
}
```

## Explanation of Configuration

- `drug_name`: Uses XPath to select the `<td>` element with the class `drug-name` to extract the drug name.
- `auth_num`: Uses XPath to select the `<td>` element with the class `approval-number` and applies a regular expression to extract the approval number. The `return` array specifies the first capturing group, which is the number after "Batch".
- `company`: Uses XPath to select the `<td>` element with the class `company-name` to extract the company name.
- `attachments`: Uses `innerHtml` to select the `<a>` elements within the `<div>` with the class `attachments`. The `extract_attachments` field is used to process the extracted links.
- `drug_reference`: Uses `innerHtml` to select the `<p>` elements within the `<div>` with the class `reference` to extract the drug reference information.

This tutorial has shown how to use Jexter to extract drug information from web pages. After configuring your extraction rules, review the output to confirm that it matches your expectations. If you find discrepancies, adjust the XPath expressions and regular expressions so that they match the structure of the target web page, and only then save your settings. Do this before proceeding to the [last study step](Study:%20attachment_step.md), so that the collected data stays accurate and complete.
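
One way to review the XPath expressions and the regular expression before running Jexter is to try them against a small copy of the page. The sketch below does not use Jexter and does not reproduce how Jexter evaluates `col`, `innerHtml`, or `extract_attachments`. It only applies the same selectors and the same pattern with Python's standard library. The sample markup is an assumption built from the class names in the configuration, the `href` values are hypothetical placeholders, and `texts` and `preview` are illustrative helper names.

```python
import re
import xml.etree.ElementTree as ET

# Assumed sample markup: class names come from the XPath expressions in the
# configuration; the href values are hypothetical placeholders.
SAMPLE = """
<html><body>
<table>
  <tr>
    <td class="drug-name">Drug A</td>
    <td class="approval-number">[Batch 123456]</td>
    <td class="company-name">Company X</td>
  </tr>
  <tr>
    <td class="drug-name">Drug B</td>
    <td class="approval-number">[Batch 789012]</td>
    <td class="company-name">Company Y</td>
  </tr>
</table>
<div class="attachments">
  <a href="a1.pdf">Attachment 1</a>
  <a href="a2.pdf">Attachment 2</a>
</div>
<div class="reference"><p>This is a detailed reference for Drug A.</p></div>
<div class="reference"><p>This is a detailed reference for Drug B.</p></div>
</body></html>
"""

# Same pattern as the "regexp" field of auth_num (JSON escaping removed).
BATCH_RE = re.compile(r"\[Batch (\d+)\]")


def texts(root, path):
    """Return the stripped text of every element matched by an ElementTree path."""
    return [(el.text or "").strip() for el in root.findall(path)]


def preview(markup):
    """Apply the configuration's selectors to well-formed markup."""
    root = ET.fromstring(markup)
    auth_nums = []
    for cell in texts(root, ".//td[@class='approval-number']"):
        match = BATCH_RE.search(cell)
        # group(1) corresponds to "return": [1] in the configuration.
        auth_nums.append(match.group(1) if match else None)
    links = root.findall(".//div[@class='attachments']/a")
    return {
        "drug_name": texts(root, ".//td[@class='drug-name']"),
        "auth_num": auth_nums,
        "company": texts(root, ".//td[@class='company-name']"),
        "attachments": [a.get("href") for a in links],
        "drug_reference": texts(root, ".//div[@class='reference']/p"),
    }


if __name__ == "__main__":
    for field, values in preview(SAMPLE).items():
        print(field, values)
```

`ElementTree` only parses well-formed XML and supports a limited XPath subset, so this check suits small hand-cleaned snippets. A real page would likely need an HTML-aware parser, and a selector that returns an empty list here is a hint to revisit the class name or path in the configuration.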